The Food Hygiene Rating Scheme (FHRS) dataset for Camden is a rich source of information on the hygiene and operation of food establishments in the borough. This Exploratory Data Analysis (EDA) aims to uncover patterns, trends, and notable observations in the dataset. The goal is both to build an overall picture of the local food scene and to extract insights that can support decision-making, policy formulation, and consumer awareness.
I chose this dataset because Camden, a vibrant London borough known for its diverse culinary landscape, is home to a wide range of food establishments, from traditional cafes and restaurants to contemporary takeaways and specialty retailers. The FHRS dataset records hygiene scores, business types, geographical locations, and other relevant fields, which makes it a valuable resource for examining the local food industry.
The dataset is more than a collection of raw numbers. Read together, hygiene scores, business types, and geographical distribution tell a story that goes beyond individual statistics. That story can inform decisions, help shape effective policies, and raise consumer awareness.
A local news article (https://www.hamhigh.co.uk/news/21348470.mouse-droppings-flies-pizzas-out-of-date-food---inside-camdens-zero-one-star-restaurants/) describes mouse droppings, flies on pizzas, and out-of-date food inside Camden’s zero- and one-star restaurants. These reports are alarming signs of hygiene and food safety failures, which pose serious health risks to customers. Given the urgency of these issues, this analysis has three aims:
- to identify restaurants with both low and high ratings;
- to examine their geographical locations;
- to assess the hygiene ratings of pubs and bars specifically.

To my knowledge, no previous analysis of this topic has been published for Camden. This underlines the need for a systematic examination in the interest of public health and safety.
The FHRS dataset for Camden was obtained from the UK government's open data portal (https://www.data.gov.uk/dataset/55022d4a-b796-46db-a7f7-c4bd800aad9a/food-hygiene-rating-scheme-camden). Sourcing the data from an official government portal lends it credibility. My personal interest in food and restaurants, together with my experience as an international student, also shaped the choice. Camden, as a vibrant culinary hub, is an ideal setting for this analysis. The dataset is available in CSV, XML, JSON, and RDF formats. I chose CSV because I am most familiar with it. CSV is also widely supported by spreadsheet and data-analysis tools, which makes the analysis easy for others to reproduce.
The main disadvantage of the dataset is the large number of missing values, which can undermine the integrity of the whole analysis. Imputing incorrect values in their place is risky, because a wrongly filled score can change how an individual restaurant is portrayed. Missing data must therefore be handled cautiously. Any inaccuracy introduced during imputation could skew the results and mislead interpretation.
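A small example makes the risk concrete. The sketch below uses four hypothetical hygiene scores, two of which are missing.

FHRS hygiene scores are penalty points, so 0 is the best score. Filling the gaps with 0 therefore has two effects: it lowers the average, and it makes the two unscored businesses look like the cleanest in the sample. The numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hygiene scores: two inspected businesses and two with no score.
scores = pd.Series([5.0, 10.0, np.nan, np.nan], name="Hygiene Score")

# pandas skips NaN by default, so the mean uses only the inspected businesses.
mean_ignoring_missing = scores.mean()            # (5 + 10) / 2

# Filling with 0 adds two "perfect" scores and drags the mean down.
mean_after_zero_fill = scores.fillna(0).mean()   # (5 + 10 + 0 + 0) / 4

print(mean_ignoring_missing, mean_after_zero_fill)
```

Leaving scores as NaN keeps the unscored businesses out of score-based summaries instead of silently ranking them first.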
%matplotlib inline
#Importing libraries that are used in this project
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo
import folium
from folium.plugins import MarkerCluster
import json
from folium.features import GeoJson, GeoJsonPopup
# Reading the Camden FHRS data from a CSV file into a DataFrame
df = pd.read_csv('Food_Hygiene_Rating_Scheme_Camden.csv')
df
|  | Business Name | Address Line 1 | Address Line 2 | Address Line 3 | Postcode | Business Type ID | Business Type Description | Food Hygiene Rating Scheme ID | Food Hygiene Rating Scheme Type | Hygiene Score | ... | Ward Code | Ward Name | Easting | Northing | Longitude | Latitude | Spatial Accuracy | Last Uploaded | Location | Organisation URI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SUSHI DAILY | NaN | 246 High Holborn | NaN | WC1V 7EX | 4613 | Retailers - other | 1653945 | FHRS | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Unknown | 26/11/2023 | NaN | http://opendatacommunities.org/id/london-borou... |
| 1 | DELICIOUSLY ELLA | NaN | 250 Tottenham Court Road | NaN | W1T 7QZ | 4613 | Retailers - other | 1567445 | FHRS | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | Unknown | 26/11/2023 | NaN | http://opendatacommunities.org/id/london-borou... |
| 2 | KINGS CROSS P BUILDING CAFE | Meta, Kings Cross P Building | 12 Lewis Cubitt Square | NaN | N1C 4DR | 1 | Restaurant/Cafe/Canteen | 1453210 | FHRS | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | Unknown | 26/11/2023 | NaN | http://opendatacommunities.org/id/london-borou... |
| 3 | SRILANKAN FOOD | NaN | Chalton Street Market | NaN | NW1 1JH | 4613 | Retailers - other | 1616574 | FHRS | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | Unknown | 26/11/2023 | NaN | http://opendatacommunities.org/id/london-borou... |
| 4 | FISH PLAICE | NaN | 32 Museum Street | NaN | WC1A 1LH | 7844 | Takeaway/sandwich shop | 1492507 | FHRS | 5.0 | ... | E05013653 | Bloomsbury | 530118.0 | 181556.0 | -0.126070 | 51.517940 | Unknown | 26/11/2023 | (51.51794, -0.12607) | http://opendatacommunities.org/id/london-borou... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3829 | Fruit and Vegetables Corner of Oxford Arms | NaN | Islip Street | NaN | NW5 2DJ | 4613 | Retailers - other | 424726 | FHRS | 0.0 | ... | E05013664 | Kentish Town South | 529100.0 | 185029.0 | -0.139447 | 51.549386 | Unknown | 26/11/2023 | (51.549386, -0.139447) | http://opendatacommunities.org/id/london-borou... |
| 3830 | FUNKY CHIPS | Unit 705, The Stables Market | Chalk Farm Road | NaN | NW1 8AH | 7844 | Takeaway/sandwich shop | 1374077 | FHRS | 5.0 | ... | E05013655 | Camden Town | 528546.0 | 184231.0 | -0.147738 | 51.542340 | Unknown | 26/11/2023 | (51.54234, -0.147738) | http://opendatacommunities.org/id/london-borou... |
| 3831 | CANDY VIBES FOR YOU LTD | NaN | NaN | NaN | NaN | 4613 | Retailers - other | 1527880 | FHRS | 5.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | Unknown | 26/11/2023 | NaN | http://opendatacommunities.org/id/london-borou... |
| 3832 | EGGLA | NaN | 44 Chalk Farm Road | NaN | NW1 8AJ | 4613 | Retailers - other | 1380775 | FHRS | 10.0 | ... | E05013660 | Haverstock | 528485.0 | 184302.0 | -0.148591 | 51.542992 | Unknown | 26/11/2023 | (51.542992, -0.148591) | http://opendatacommunities.org/id/london-borou... |
| 3833 | The Convenience Store | NaN | 63 St Giles High Street | NaN | WC2H 8LE | 4613 | Retailers - other | 952255 | FHRS | 10.0 | ... | E05013662 | Holborn and Covent Garden | 530013.0 | 181279.0 | -0.127684 | 51.515475 | Unknown | 26/11/2023 | (51.515475, -0.127684) | http://opendatacommunities.org/id/london-borou... |
3834 rows × 30 columns
#Displays the columns of the dataset
df.columns
Index(['Business Name', 'Address Line 1', 'Address Line 2', 'Address Line 3',
'Postcode', 'Business Type ID', 'Business Type Description',
'Food Hygiene Rating Scheme ID', 'Food Hygiene Rating Scheme Type',
'Hygiene Score', 'Structural Score', 'Confidence In Management Score',
'Rating Value', 'Rating Date', 'New Rating Pending',
'Local Authority Business ID', 'Local Authority Code',
'Local Authority Name', 'Local Authority Email Address',
'Local Authority Website', 'Ward Code', 'Ward Name', 'Easting',
'Northing', 'Longitude', 'Latitude', 'Spatial Accuracy',
'Last Uploaded', 'Location', 'Organisation URI'],
dtype='object')
#Displays number of rows and columns in the dataset
df.shape
(3834, 30)
# Identifying the numeric columns
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
numeric_df = df.select_dtypes(include=numerics)
len(numeric_df.columns)
12
#This generates the descriptive statistics for each numerical column
df.describe()
|  | Address Line 3 | Business Type ID | Food Hygiene Rating Scheme ID | Hygiene Score | Structural Score | Confidence In Management Score | Local Authority Business ID | Local Authority Code | Easting | Northing | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 0.0 | 3834.000000 | 3.834000e+03 | 3291.000000 | 3291.000000 | 3291.000000 | 3834.000000 | 3834.0 | 3422.000000 | 3422.000000 | 3422.000000 | 3422.000000 |
| mean | NaN | 4024.980438 | 1.047931e+06 | 4.539654 | 5.504406 | 5.721665 | 139453.720918 | 506.0 | 528954.768556 | 183316.972238 | -0.142184 | 51.534033 |
| std | NaN | 3543.641317 | 4.563708e+05 | 4.416657 | 4.286924 | 4.855387 | 78711.930386 | 0.0 | 1629.282072 | 1494.711161 | 0.023094 | 0.013699 |
| min | NaN | 1.000000 | 4.236590e+05 | 0.000000 | 0.000000 | 0.000000 | 16.000000 | 506.0 | 523976.000000 | 180938.000000 | -0.213138 | 51.512406 |
| 25% | NaN | 1.000000 | 4.264245e+05 | 0.000000 | 5.000000 | 0.000000 | 60467.750000 | 506.0 | 528426.750000 | 181873.000000 | -0.148897 | 51.520718 |
| 50% | NaN | 4613.000000 | 1.140803e+06 | 5.000000 | 5.000000 | 5.000000 | 195852.000000 | 506.0 | 529261.000000 | 183438.500000 | -0.137934 | 51.535036 |
| 75% | NaN | 7843.750000 | 1.453214e+06 | 5.000000 | 10.000000 | 10.000000 | 200783.250000 | 506.0 | 530145.000000 | 184516.250000 | -0.125506 | 51.545314 |
| max | NaN | 7846.000000 | 1.676120e+06 | 25.000000 | 25.000000 | 30.000000 | 203623.000000 | 506.0 | 532012.000000 | 187472.000000 | -0.097915 | 51.571527 |
#displays columns of dataset and their types
df.dtypes
| Column | dtype |
|---|---|
| Business Name | object |
| Address Line 1 | object |
| Address Line 2 | object |
| Address Line 3 | float64 |
| Postcode | object |
| Business Type ID | int64 |
| Business Type Description | object |
| Food Hygiene Rating Scheme ID | int64 |
| Food Hygiene Rating Scheme Type | object |
| Hygiene Score | float64 |
| Structural Score | float64 |
| Confidence In Management Score | float64 |
| Rating Value | object |
| Rating Date | object |
| New Rating Pending | bool |
| Local Authority Business ID | int64 |
| Local Authority Code | int64 |
| Local Authority Name | object |
| Local Authority Email Address | object |
| Local Authority Website | object |
| Ward Code | object |
| Ward Name | object |
| Easting | float64 |
| Northing | float64 |
| Longitude | float64 |
| Latitude | float64 |
| Spatial Accuracy | object |
| Last Uploaded | object |
| Location | object |
| Organisation URI | object |

dtype: object
#Used for checking null values and returns the result as boolean
df.isna()
|  | Business Name | Address Line 1 | Address Line 2 | Address Line 3 | Postcode | Business Type ID | Business Type Description | Food Hygiene Rating Scheme ID | Food Hygiene Rating Scheme Type | Hygiene Score | ... | Ward Code | Ward Name | Easting | Northing | Longitude | Latitude | Spatial Accuracy | Last Uploaded | Location | Organisation URI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | True | False | True | False | False | False | False | False | True | ... | True | True | True | True | True | True | False | False | True | False |
| 1 | False | True | False | True | False | False | False | False | False | False | ... | True | True | True | True | True | True | False | False | True | False |
| 2 | False | False | False | True | False | False | False | False | False | False | ... | True | True | True | True | True | True | False | False | True | False |
| 3 | False | True | False | True | False | False | False | False | False | True | ... | True | True | True | True | True | True | False | False | True | False |
| 4 | False | True | False | True | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3829 | False | True | False | True | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3830 | False | False | False | True | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3831 | False | True | True | True | True | False | False | False | False | False | ... | True | True | True | True | True | True | False | False | True | False |
| 3832 | False | True | False | True | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3833 | False | True | False | True | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
3834 rows × 30 columns
# Calculating the number of missing values for each column and sorting in descending order
miss_values = df.isna().sum().sort_values(ascending=False)
miss_values
| Column | Missing values |
|---|---:|
| Address Line 3 | 3834 |
| Address Line 1 | 2641 |
| Hygiene Score | 543 |
| Structural Score | 543 |
| Confidence In Management Score | 543 |
| Rating Date | 461 |
| Ward Name | 422 |
| Ward Code | 422 |
| Latitude | 412 |
| Longitude | 412 |
| Location | 412 |
| Northing | 412 |
| Easting | 412 |
| Address Line 2 | 122 |
| Postcode | 121 |
| Local Authority Website | 0 |
| Spatial Accuracy | 0 |
| Last Uploaded | 0 |
| Business Name | 0 |
| Local Authority Business ID | 0 |
| Local Authority Email Address | 0 |
| Local Authority Name | 0 |
| Local Authority Code | 0 |
| New Rating Pending | 0 |
| Rating Value | 0 |
| Food Hygiene Rating Scheme Type | 0 |
| Food Hygiene Rating Scheme ID | 0 |
| Business Type Description | 0 |
| Business Type ID | 0 |
| Organisation URI | 0 |

dtype: int64
miss_values.plot(kind='barh',color='red')
# Removing duplicate rows from the dataset
df.drop_duplicates(inplace=True)
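# Missing values in ID, code and coordinate columns are filled with 0.
# A coordinate of 0 therefore means "location unknown" and must be filtered out before mapping.
# The three FHRS score columns are left as NaN: a score of 0 means "very good",
# so filling them with 0 would give unscored businesses the best possible score.
# 'Location' is a text column, so it is not included here.
numeric_columns = ['Address Line 3', 'Business Type ID', 'Food Hygiene Rating Scheme ID',
                   'Local Authority Business ID', 'Local Authority Code',
                   'Easting', 'Northing', 'Longitude', 'Latitude']
for col in numeric_columns:
    df[col] = df[col].fillna(0)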
# Missing values in string columns are filled with 'unknown'.
string_columns = ['Business Name', 'Address Line 1', 'Address Line 2', 'Postcode',
                  'Business Type Description', 'Food Hygiene Rating Scheme Type',
                  'Rating Value', 'Rating Date', 'Local Authority Name',
                  'Local Authority Email Address', 'Local Authority Website', 'Ward Code', 'Spatial Accuracy', 'Last Uploaded',
                  'Organisation URI']
for col in string_columns:
    df[col] = df[col].fillna("unknown")
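# Missing ward names are marked 'unknown' rather than back-filled.
# bfill would copy the ward of an unrelated neighbouring row.
df['Ward Name'] = df['Ward Name'].fillna('unknown')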
df.isna().sum().sort_values(ascending=False)
| Column | Missing values |
|---|---:|
| Business Name | 0 |
| Address Line 1 | 0 |
| Location | 0 |
| Last Uploaded | 0 |
| Spatial Accuracy | 0 |
| Latitude | 0 |
| Longitude | 0 |
| Northing | 0 |
| Easting | 0 |
| Ward Name | 0 |
| Ward Code | 0 |
| Local Authority Website | 0 |
| Local Authority Email Address | 0 |
| Local Authority Name | 0 |
| Local Authority Code | 0 |
| Local Authority Business ID | 0 |
| New Rating Pending | 0 |
| Rating Date | 0 |
| Rating Value | 0 |
| Confidence In Management Score | 0 |
| Structural Score | 0 |
| Hygiene Score | 0 |
| Food Hygiene Rating Scheme Type | 0 |
| Food Hygiene Rating Scheme ID | 0 |
| Business Type Description | 0 |
| Business Type ID | 0 |
| Postcode | 0 |
| Address Line 3 | 0 |
| Address Line 2 | 0 |
| Organisation URI | 0 |

dtype: int64
This visualization offers a rapid overview of the various business types present in the dataset along with their respective counts. It proves useful for comprehending the composition and diversity of businesses in the dataset, enabling us to select a specific category for more in-depth analysis.
# Count establishments per business type
business_type_counts = df['Business Type Description'].value_counts()
# Create a figure and axis
fig, ax = plt.subplots(figsize=(12, 6))
# Bar Plot
business_type_counts.plot(kind='bar', color='coral', ax=ax)
ax.set_title('Business Type Distribution')
ax.set_xlabel('Business Type')
ax.set_ylabel('Count')
# Table
table_data = pd.DataFrame({'Business Type': business_type_counts.index, 'Count': business_type_counts.values})
table = ax.table(cellText=table_data.values, colLabels=table_data.columns, cellLoc='center', loc='bottom', bbox=[0, -1.20, 1, 0.5])
# Adjust table font size
table.auto_set_font_size(False)
table.set_fontsize(10)
plt.show()
The bar chart shows that the 'Restaurant/Cafe/Canteen' category has the highest count, making it the predominant business type in the dataset. It is followed by 'Takeaway/sandwich shop' and then 'Retailers - other'.
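Counts are easier to compare across boroughs or years when expressed as proportions. A minimal sketch using `value_counts(normalize=True)` on a few hypothetical rows; in the notebook the same call would be applied to `df['Business Type Description']`.

```python
import pandas as pd

# Hypothetical stand-in for df['Business Type Description'].
business_types = pd.Series([
    "Restaurant/Cafe/Canteen", "Restaurant/Cafe/Canteen",
    "Takeaway/sandwich shop", "Retailers - other",
])

# normalize=True returns each category's share of all rows instead of a count.
shares = business_types.value_counts(normalize=True)
print((shares * 100).round(1))
```

The result is sorted from largest to smallest share, like the counts in the chart, so the two views can be read side by side.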
This interactive Plotly Express bar chart explores hygiene scores across business types in the FHRS dataset. For each distinct 'Hygiene Score' value, the code keeps the first five establishments listed in the dataset. The chart is therefore a small sample in file order, not a list of the five best or worst businesses. Bars are colour-coded by business type. Each establishment contributes a segment whose height is its hygiene score, and segments for the same business type are stacked. The result gives a quick visual impression of which kinds of food establishment appear at each score.
import plotly.express as px
# Keep the first five rows for each distinct 'Hygiene Score' value (file order, not a ranking)
top5_values = df.groupby('Hygiene Score').head(5)
# Create an interactive bar plot using Plotly
fig = px.bar(top5_values, x='Business Type Description', y='Hygiene Score', color='Business Type Description',
title='Top 5 Hygiene Scores by Business Type', labels={'Hygiene Score': 'Hygiene Score'},)
# Show the plot
fig.show()
In the chart, 'Restaurant/Cafe/Canteen' establishments appear across the full range of hygiene scores, from 0 to 25, and dominate the sample. However, the sample is small and the stacked bar heights are sums of scores. The chart is therefore better read as an indication of which business types occur at each score than as a measure of performance.

It does suggest that restaurants merit closer study. Identifying the names and locations of individual restaurants at both ends of the scale gives a more granular view of their hygiene practices and may reveal patterns behind their scores. Such investigations can support targeted improvements, the sharing of good practice, and greater transparency, ultimately making Camden's food environment safer and better informed.
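A cross-tabulation avoids both limitations: it counts every establishment rather than a file-order sample, and each cell is a count rather than a sum of scores. A minimal sketch with hypothetical rows standing in for `df[['Business Type Description', 'Hygiene Score']]`:

```python
import pandas as pd

# Hypothetical rows; the real input would be the two columns of df.
sample = pd.DataFrame({
    "Business Type Description": [
        "Restaurant/Cafe/Canteen", "Restaurant/Cafe/Canteen",
        "Takeaway/sandwich shop", "Takeaway/sandwich shop", "Retailers - other",
    ],
    "Hygiene Score": [0, 25, 5, 5, 0],
})

# One row per business type, one column per hygiene score, cells = counts.
counts = pd.crosstab(sample["Business Type Description"], sample["Hygiene Score"])
print(counts)

# Row-normalised shares make business types of different sizes comparable.
shares = pd.crosstab(sample["Business Type Description"], sample["Hygiene Score"],
                     normalize="index")
print(shares.round(2))
```

On the full DataFrame, `counts.plot(kind='bar', stacked=True)` would give a bar chart whose heights are numbers of establishments.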
We now narrow the analysis to the 'Restaurant/Cafe/Canteen' category and to hygiene scores of 20 and 25. Under the FHRS, the hygiene score is a count of penalty points awarded at inspection. A score of 0 means very good compliance, while 20 and 25 mean that major and urgent improvement, respectively, are necessary.

Identifying the businesses in these two bands gives consumers information they can use when choosing where to eat. It also shows restaurants and food safety officers where improvement is most needed. Because hygiene is central to food quality and, consequently, to people's well-being, this makes the analysis a useful resource.
# Filter DataFrame for the specific business type and multiple Hygiene Scores
selected_business_type = 'Restaurant/Cafe/Canteen'
hygiene_scores = [25,20]
filtered_df = df[(df['Business Type Description'] == selected_business_type) & (df['Hygiene Score'].isin(hygiene_scores))]
# Create a pivot table for the heatmap
pivot_df = filtered_df.pivot_table(index='Business Name', columns='Business Type Description', values='Hygiene Score', aggfunc='mean', fill_value=0)
# Heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(pivot_df, annot=True, cmap='viridis', fmt='g')
plt.title(f'Heatmap for {selected_business_type} with Hygiene Scores {", ".join(map(str, hygiene_scores))}')
plt.xlabel('Business Type Description')
plt.ylabel('Business Name')
plt.show()
The heatmap shows five restaurants with a hygiene score of 25: BEBEK MANGEL, David's Deli, NE ZHA, Redemption Roasters, and TARIM. Because FHRS hygiene scores are penalty points, 25 is the worst possible score. It means that, at the inspection recorded in this data (uploaded on 26/11/2023), food safety officers judged that urgent improvement in hygienic food handling was necessary.

These results reflect conditions at the time of inspection and may have changed after a re-inspection. Camden draws visitors from around the world, so access to hygienic food is paramount. These findings show where consumers should take care and where follow-up inspections could be prioritised.
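To keep the direction of the scale explicit, the numeric scores can be labelled with their FHRS descriptors. A minimal sketch, assuming the FSA's published descriptors for the hygiene and structural scores, where lower is better. The dictionary name `SCORE_DESCRIPTORS` and the sample scores are illustrative. The confidence-in-management score uses a different scale (up to 30), so it would need its own mapping.

```python
import pandas as pd

# Assumed FSA descriptors for the Hygiene and Structural scores (lower is better).
SCORE_DESCRIPTORS = {
    0: "Very good",
    5: "Good",
    10: "Generally satisfactory",
    15: "Improvement necessary",
    20: "Major improvement necessary",
    25: "Urgent improvement necessary",
}

# Hypothetical scores; in the notebook this would be df['Hygiene Score'].
# NaN stands for a business with no recorded score.
scores = pd.Series([0.0, 25.0, 10.0, float("nan")], name="Hygiene Score")

# dict.get finds 25.0 under the key 25 because equal numbers hash equally.
# na_action="ignore" leaves missing scores as NaN instead of looking them up.
labels = scores.map(SCORE_DESCRIPTORS.get, na_action="ignore")
print(labels)
```

A label column built this way could also serve as hover text in the Plotly charts, so readers do not mistake a high score for a good one.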
This interactive scatter plot examines 'Restaurant/Cafe/Canteen' establishments with a hygiene score of 25. Each business name is plotted against its ward name, and each marker is coloured by postcode. This allows individual establishments, including different branches of the same business, to be told apart. The chart gives a quick view of where Camden's worst-scoring restaurants are located.
# Restaurants/cafes/canteens with a hygiene score of 25 (the worst FHRS band)
filtered_data = df[(df['Hygiene Score'] == 25) & (df['Business Type Description'] == 'Restaurant/Cafe/Canteen')]
# Create an interactive scatter plot with Plotly Express
fig = px.scatter(filtered_data,
x='Ward Name',
y='Business Name',
color='Postcode', # Use 'Postcode' for color-coding
size=[8]*len(filtered_data),  # Use a constant marker size of 8 for every business
labels={'Ward Name': 'Ward Name', 'Business Name': 'Business Name'},
title='Multivariate Chart for Hygiene Score 25, Restaurant/Cafe/Canteen',
template='plotly_dark')
# Show the interactive chart
fig.show()
Beyond the scores themselves, the chart identifies each establishment's ward name and postcode. This lets readers be pointed to specific locations rather than just business names. Location matters because taste and hygiene standards can differ between branches of the same business. Notably, NE ZHA and TARIM, which both have a hygiene score of 25, are in the same ward, Holborn and Covent Garden. This is why location should be considered alongside ratings when choosing where to eat.
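The same information can also be listed as text, which is easier to quote in a report than a scatter plot. A minimal sketch: `businesses_by_ward` is a hypothetical helper that groups matching establishments by ward. The ward for NE ZHA and TARIM comes from the text above; the other sample rows are invented for illustration.

```python
import pandas as pd


def businesses_by_ward(frame, score, business_type="Restaurant/Cafe/Canteen"):
    """Return {ward name: sorted business names} for one hygiene score."""
    subset = frame[(frame["Hygiene Score"] == score)
                   & (frame["Business Type Description"] == business_type)]
    # groupby sorts the ward names; each group is a DataFrame of matching rows.
    return {ward: sorted(group["Business Name"])
            for ward, group in subset.groupby("Ward Name")}


# Hypothetical rows; "EXAMPLE CAFE" and "EXAMPLE DELI" are invented.
sample = pd.DataFrame({
    "Business Name": ["TARIM", "NE ZHA", "EXAMPLE CAFE", "EXAMPLE DELI"],
    "Business Type Description": ["Restaurant/Cafe/Canteen"] * 4,
    "Ward Name": ["Holborn and Covent Garden", "Holborn and Covent Garden",
                  "Camden Town", "Camden Town"],
    "Hygiene Score": [25.0, 25.0, 25.0, 5.0],
})

print(businesses_by_ward(sample, 25))
```

On the real data, `businesses_by_ward(df, 25)` would list the score-25 restaurants per ward, without relying on the chart's colour coding.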
This interactive Folium map shows the locations of 'Restaurant/Cafe/Canteen' establishments with hygiene scores of 20 or 25, the two worst bands. The map is centred on Camden, and a MarkerCluster groups nearby markers to keep the map readable. Each marker represents one restaurant, and its popup shows the business name and ward name. Establishments without recorded coordinates (missing, or filled with 0 during cleaning) are excluded so that they are not plotted at latitude 0, longitude 0. The map lets users see where these poorly scoring establishments are concentrated across Camden.
# Filter the data based on the specified criteria
filtered_data = df[(df['Hygiene Score'].between(20, 25)) & (df['Business Type Description'] == 'Restaurant/Cafe/Canteen')]
# Drop rows without usable coordinates (missing, or filled with 0 during cleaning)
filtered_data = filtered_data[filtered_data['Latitude'].notna() & (filtered_data['Latitude'] != 0)]
# Centre the map on Camden, using the mean coordinates of establishments with a known location
camden_centre = [df.loc[df['Latitude'] != 0, 'Latitude'].mean(), df.loc[df['Longitude'] != 0, 'Longitude'].mean()]
camden_map = folium.Map(location=camden_centre, zoom_start=13)
# Create a MarkerCluster to group markers for better visualization
marker_cluster = MarkerCluster().add_to(camden_map)
# Add markers for each business
for index, row in filtered_data.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"{row['Business Name']} - {row['Ward Name']}",
    ).add_to(marker_cluster)
# Display the Folium map
camden_map
This sunburst chart looks at the other end of the scale: 'Restaurant/Cafe/Canteen' establishments in Camden with a hygiene score of 0, the best possible score. The chart takes the first ten such restaurants in alphabetical order and groups them by ward, showing where they are located and what they are called.
# Filter the data based on the specified criteria
filtered_data = df[(df['Hygiene Score'] == 0) & (df['Business Type Description'] == 'Restaurant/Cafe/Canteen')]
# Sort alphabetically by business name and keep the first 10 for readability
sorted_data = filtered_data.sort_values(by='Business Name').head(10)
# Create a sunburst chart using plotly.express
fig = px.sunburst(sorted_data, path=['Ward Name', 'Business Name'], title='Sunburst Chart for Hygiene Score 0, Restaurant/Cafe/Canteen (first 10 alphabetically)')
# Show the chart
fig.show()
This analysis matters because the hygiene practices reflected in these scores directly affect public health. Customers trust these restaurants with their money and their well-being, so a high standard of food hygiene is essential. The results can serve three audiences:
- the public, who can be guided towards establishments with better hygiene;
- restaurants, which can see where they need to improve;
- food safety officers, who can target their inspections.

Together, these uses contribute to higher food safety standards in Camden's dining establishments.
For this analysis, we focus on Pub/Bar/Nightclub establishments and compare them using the 'Rating Value' column. This column is the FHRS food hygiene rating from 0 to 5 awarded by the local authority after inspection, where 5 is the best. It is not a customer rating. Non-numeric entries in the column are converted to missing values. A Seaborn stripplot shows how these ratings are distributed across wards.
In the resulting stripplot, each dot represents one business, and its colour indicates the ward. To keep the plot readable, the code keeps at most seven establishments per rating value. These are the first seven listed in the data, not a ranking. The plot therefore shows which wards appear at each rating rather than giving a complete count. Even so, it can help businesses, policymakers, and other stakeholders understand how pubs, bars, and nightclubs in the area perform.
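Because 'Rating Value' is derived from the three inspection scores rather than from customer feedback, the link between them can be made explicit. Under the FSA Brand Standard, as I understand it, two things determine the rating: the total of the hygiene, structural, and confidence-in-management scores, and a cap set by the single worst score.

The sketch below encodes those bands. The function name `rating_from_scores` is hypothetical, and the thresholds should be verified against the current Brand Standard before relying on them.

```python
def rating_from_scores(hygiene, structural, management):
    """Sketch of the FHRS rating lookup from the three inspection scores.

    Assumed bands (check against the FSA Brand Standard): each tuple is
    (maximum total score, maximum single score, rating). The rating is the
    highest band whose two conditions are both met; otherwise it is 0.
    """
    total = hygiene + structural + management
    worst = max(hygiene, structural, management)
    bands = [
        (15, 5, 5),
        (20, 10, 4),
        (30, 10, 3),
        (40, 15, 2),
        (50, 20, 1),
    ]
    for max_total, max_single, rating in bands:
        if total <= max_total and worst <= max_single:
            return rating
    return 0


print(rating_from_scores(5, 5, 5))    # total 15, worst 5 -> 5
print(rating_from_scores(25, 0, 0))   # a single score of 25 -> 0
```

Applied row-wise to `df` (skipping rows with missing scores), a function like this could check whether each 'Rating Value' is consistent with its three scores.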
# Replace infinite values with NaN in the entire DataFrame
df.replace([np.inf, -np.inf], np.nan, inplace=True)
# Convert 'Rating Value' to numeric (if it's not already)
df['Rating Value'] = pd.to_numeric(df['Rating Value'], errors='coerce')
# Filter rows with non-null 'Rating Value' and specific business type
filtered_df = df[(df['Business Type Description'] == 'Pub/bar/nightclub') & df['Rating Value'].notnull()]
# Keep at most seven rows per 'Rating Value' (the first seven in file order, not a ranking)
top_values = filtered_df.groupby('Rating Value').head(7)
# Plot using Seaborn with unique color for each 'Ward Name'
plt.figure(figsize=(12, 8))
sns.stripplot(
data=top_values,
x='Business Type Description',
y='Rating Value',
hue='Ward Name',
palette='viridis',
jitter=True,
dodge=True,
size=12, # Adjust the size of each dot
alpha=0.7
)
# Customize the plot
plt.title('Up to 7 Establishments per Rating Value: Pub/Bar/Nightclub')
plt.xlabel('Business Type')
plt.ylabel('Rating Value')
# Show the legend outside the plot
plt.legend(title='Ward Name', bbox_to_anchor=(1.05, 1), loc='upper left')
# Display the plot
plt.show()
The distribution of ratings for Pub/Bar/Nightclub establishments shows some interesting patterns. In this sample, 0-rated pubs appear only in Bloomsbury and Kings Cross. In contrast, 5-rated pubs and bars are spread across several wards, including Kings Cross, Regents Park, and Bloomsbury. To analyse this further, we narrow the focus to establishments rated 0 and 5. Isolating the best- and worst-rated pubs should give deeper insight into how hygiene inspection outcomes vary between areas.
We select Rating Values 0 and 5 and restrict the data to the five wards with the most establishments at these ratings. The results are presented in a FacetGrid with one panel per rating. This makes it easier to see which wards contain these ratings, as a first step towards identifying the businesses concerned.
# Filter the data for 'Pub/bar/nightclub' business type and Rating Values '0' and '5'
pub_data_filtered = df[(df['Business Type Description'] == 'Pub/bar/nightclub') & (df['Rating Value'].isin([0, 5]))]
# Get the top 5 ward names based on count
top_5_wards = pub_data_filtered['Ward Name'].value_counts().nlargest(5).index
# Filter the data for the top 5 ward names
pub_data_filtered_top5 = pub_data_filtered[pub_data_filtered['Ward Name'].isin(top_5_wards)]
# Create a FacetGrid for each Rating Value
g = sns.FacetGrid(pub_data_filtered_top5, col='Rating Value', col_wrap=2, height=5, palette='viridis')
# Map a count plot to each facet
g.map(sns.countplot, 'Ward Name', palette='plasma', order=top_5_wards)
# Customize the charts
g.set_titles("Rating Value {col_name}")
g.set_axis_labels('Ward Name', 'Count')
# Rotate x-axis labels for better readability
g.set_xticklabels(rotation=45, ha='right')
# Display the charts
plt.tight_layout()
plt.show()
# Create tables for each facet
for rating_value in [0, 5]:
    table_data = pub_data_filtered_top5[pub_data_filtered_top5['Rating Value'] == rating_value].pivot_table(
        index='Ward Name',
        values='Business Name',
        aggfunc='count'
    )
    print(f'\nTable for Rating Value {rating_value}:\n{table_data.to_markdown()}')
Table for Rating Value 0:

| Ward Name | Business Name |
|:------------|----------------:|
| Bloomsbury | 1 |
| Kings Cross | 1 |

Table for Rating Value 5:

| Ward Name | Business Name |
|:--------------------------|----------------:|
| Bloomsbury | 33 |
| Camden Town | 14 |
| Holborn and Covent Garden | 42 |
| Kings Cross | 11 |
| Regents Park | 14 |
The FacetGrid gives a clear overview of Pub/Bar/Nightclub ratings in Camden for Rating Values 0 and 5, restricted to the five wards with the most such establishments. Within these wards, only two establishments are rated 0, one each in Bloomsbury and Kings Cross. By contrast, 114 are rated 5, with Holborn and Covent Garden (42) and Bloomsbury (33) contributing the most. The counts show how common each rating is in different areas. This can help stakeholders, businesses, and policymakers identify wards where interventions or improvements may be needed and inform strategic decisions in the food industry.
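Raw counts depend on how many pubs each ward has, so proportions can make the comparison easier to read. A minimal sketch that rebuilds the two tables above as one DataFrame and computes shares. Wards missing from the Rating Value 0 table are entered as 0.

```python
import pandas as pd

# Counts copied from the Rating Value 0 and 5 tables printed above
# (the five wards with the most pubs/bars/nightclubs rated 0 or 5).
counts = pd.DataFrame(
    {0: [1, 0, 0, 1, 0], 5: [33, 14, 42, 11, 14]},
    index=["Bloomsbury", "Camden Town", "Holborn and Covent Garden",
           "Kings Cross", "Regents Park"],
)

# Share of the 5-rated establishments (in these wards) found in each ward.
share_of_fives = counts[5] / counts[5].sum()

# Share of establishments rated 0 among all those rated 0 or 5.
share_rated_zero = counts[0].sum() / counts.to_numpy().sum()

print(share_of_fives.round(3))
print(round(share_rated_zero, 3))
```

Expressed this way, Holborn and Covent Garden holds roughly 37% of the 5-rated establishments in these wards, and 0-rated establishments make up under 2% of the total.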
In this geospatial analysis, our focus is on Pub/Bar/Nightclub establishments in Camden. The goal is to visualize how ratings are distributed across locations, giving a picture of how these businesses perform. The map uses color-coded markers: establishments rated '5' are green, those rated '2' to '4' are yellow, and those rated '0' or '1' are red, so the best- and worst-rated establishments in each area stand out at a glance. Nearby markers are grouped into clusters that break apart as you zoom in, and clicking a marker shows the business name and rating. Explore the map to see the geographic patterns in these ratings and what they may mean for businesses and consumers.
import folium
from folium.plugins import MarkerCluster

# Keep Pub/bar/nightclub establishments of every rating (0-5) so that all three
# marker colors can appear, and drop rows without coordinates, since folium
# cannot place a marker at a missing location
filtered_data_top5 = df[
    (df['Business Type Description'] == 'Pub/bar/nightclub')
    & (df['Rating Value'].isin([0, 1, 2, 3, 4, 5]))
].dropna(subset=['Latitude', 'Longitude'])
# Create a base map centered on Kings Cross
map_center = [51.5326, -0.1240]
mymap = folium.Map(location=map_center, zoom_start=14)
# Group nearby markers into clusters that expand as the user zooms in
marker_cluster = MarkerCluster().add_to(mymap)
# Add one circle marker per business
for _, row in filtered_data_top5.iterrows():
    lat, lon = row['Latitude'], row['Longitude']
    # Popup shown when the marker is clicked
    popup_text = f"Business Name: {row['Business Name']}<br>Rating Value: {row['Rating Value']}"
    # Marker color by rating: 5 green, 0-1 red, 2-4 yellow
    marker_color = 'green' if row['Rating Value'] == 5 else 'red' if row['Rating Value'] in [0, 1] else 'yellow'
    folium.CircleMarker(
        location=[lat, lon],
        radius=10,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.6,
        popup=popup_text
    ).add_to(marker_cluster)
# Add a light alternative basemap (OpenStreetMap is already the default)
folium.TileLayer('cartodbpositron').add_to(mymap)
# Layer control lets the reader switch between the two basemaps
folium.LayerControl().add_to(mymap)
# Save the map to HTML and display it inline in the notebook
mymap.save('customized_map.html')
mymap
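The color legend described before the map can be isolated into a small function. This makes the rule easy to test, and unexpected values raise an error instead of silently being drawn yellow. This is a minimal sketch that assumes `Rating Value` has already been converted to numbers on the 0-5 scale. FHRS also records non-numeric statuses such as 'Exempt' or 'AwaitingInspection', and this version rejects them. The name `rating_color` is illustrative.

```python
def rating_color(rating):
    """Return the marker color for an FHRS rating on the 0-5 scale.

    Follows the map legend: 5 -> green, 2-4 -> yellow, 0-1 -> red.
    Anything else (e.g. NaN or a non-numeric status) raises ValueError
    instead of falling through to yellow.
    """
    if rating == 5:
        return "green"
    if rating in (2, 3, 4):
        return "yellow"
    if rating in (0, 1):
        return "red"
    raise ValueError(f"Unexpected rating value: {rating!r}")


# Floats such as 5.0 (common after numeric conversion) compare equal to ints
for r in [0, 1, 3, 5.0]:
    print(r, rating_color(r))
```

Inside the marker loop, `marker_color = rating_color(row['Rating Value'])` could replace the inline conditional.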
Throughout this project, we encountered and worked around several challenges in deriving meaningful insights from Camden's FHRS dataset. The most notable was missing or incomplete data in fields such as hygiene scores, rating values, addresses, and geographical coordinates, which limited our view of hygiene standards and their spatial distribution. To address this, we used data imputation techniques to reduce the impact of missing values.

Despite these challenges, the analysis produced valuable insights. The business type analysis showed the dominance of 'Restaurant/Cafe/Canteen' establishments, while the heatmap highlighted top performers such as BEBEK MANGEL and NE ZHA that can guide targeted improvements. The geospatial analysis, illustrated by the Folium map, showed that location should be considered alongside ratings when choosing where to eat. Exploring Pub/Bar/Nightclub ratings revealed distinct geographic patterns for Rating Values '0' and '5', and the FacetGrid plot identified the wards where interventions may be needed.

The project's significance lies in promoting public health: it guides consumers towards establishments with better hygiene and helps food safety officers target inspections. Despite its challenges, this project is a valuable resource for fostering a safer and more transparent food environment in Camden, supporting higher food safety standards, informed decision-making, and improved well-being for both businesses and consumers.
The exploration of Camden's FHRS dataset aimed to provide a comprehensive understanding of the local food scene and to extract insights that support informed decision-making and consumer awareness. Missing and incomplete data were a recurring challenge, and data imputation was applied with care to limit the bias it can introduce. The analysis revealed patterns in business types, highlighted top performers, and mapped the geographic distribution of ratings.
On reflection, missing values clearly affected the analysis, and imputation had to be handled cautiously so that filled-in values were not mistaken for observed ones. The project's significance lies in its contribution to public health: guiding consumers to establishments with better hygiene, supporting food safety officers, and fostering a safer food environment in Camden.
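The reflection above stresses handling imputation cautiously. One generic way to make that concrete is to report how much of each column is missing and to keep a flag on every imputed cell, so that any statistic can be recomputed with imputed values excluded. The sketch below is an illustration only, not the imputation method used earlier in this notebook. The helper name `impute_median_with_flags`, the `Hygiene Score` column name and the toy values are hypothetical, and median filling is just one example strategy.

```python
import numpy as np
import pandas as pd


def impute_median_with_flags(df, columns):
    """Fill missing values in `columns` with each column's median and add a
    boolean `<column>_imputed` flag marking the cells that were filled.
    The input DataFrame is left unchanged."""
    out = df.copy()
    for col in columns:
        missing = out[col].isna()
        out[f"{col}_imputed"] = missing
        out[col] = out[col].fillna(out[col].median())
    return out


# Hypothetical toy data standing in for an FHRS hygiene score column
toy = pd.DataFrame({
    "Business Name": ["A", "B", "C", "D"],
    "Hygiene Score": [0.0, np.nan, 5.0, 20.0],
})

# Share of missing values per column
print(toy.isna().mean())

filled = impute_median_with_flags(toy, ["Hygiene Score"])
print(filled)

# Sensitivity check: the same statistic with and without imputed cells
with_imputed = filled["Hygiene Score"].mean()
observed_only = filled.loc[~filled["Hygiene Score_imputed"], "Hygiene Score"].mean()
print(with_imputed, observed_only)
```

If the two means differ noticeably, as they do here (7.5 with the imputed cell, about 8.33 without), conclusions that depend on that statistic should be reported with the imputation caveat.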
Future directions could involve refining data collection methods to reduce missing values, exploring more advanced imputation techniques, and investigating in depth the factors associated with good hygiene ratings. Extending the analysis over time could also uncover trends or changes in hygiene standards. This project provides a foundation for ongoing research into food establishments in Camden and beyond.
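A first step towards the temporal analysis suggested above could be to group ratings by the year in which they were awarded. This is a sketch under stated assumptions. FHRS publishes a rating date for each establishment, but the column name `Rating Date` used here and the toy values are hypothetical, and ratings are assumed to be numeric. A single FHRS snapshot holds only each establishment's current rating, so grouping by year shows when current ratings were awarded rather than a full inspection history.

```python
import pandas as pd

# Hypothetical toy data: one row per establishment with its current rating
# and the date that rating was awarded
toy = pd.DataFrame({
    "Rating Value": [5, 3, 0, 5, 4, 1],
    "Rating Date": ["2019-03-01", "2019-11-20", "2021-06-15",
                    "2021-09-02", "2023-01-10", "2023-05-30"],
})

# Parse dates; unparseable entries become NaT and are left out of the groups
toy["Rating Date"] = pd.to_datetime(toy["Rating Date"], errors="coerce")

# Number of ratings and mean rating per award year
by_year = (
    toy.groupby(toy["Rating Date"].dt.year)["Rating Value"]
       .agg(["count", "mean"])
       .rename_axis("Rating year")
)
print(by_year)
```

With real data, comparing successive snapshots of the dataset would be needed to track how individual establishments' ratings change over time.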
%%js
// Run this cell to update your word count.
// Requires the classic Jupyter Notebook interface, which provides the
// Jupyter.notebook object used below.
function wordcount() {
    let wordCount = 0;
    let extraCount = 0;
    let mainBody = true;
    let cells = Jupyter.notebook.get_cells();
    cells.forEach((cell) => {
        if (cell.cell_type == 'markdown') {
            let text = cell.get_text();
            // Stop counting as main body when getting to References or Appendices
            if (text.startsWith('## References') || text.startsWith('## Appendices')) {
                mainBody = false;
            }
            // Exclude the word count cell itself
            if (text.startsWith('## Word Count')) {
                text = '';
            }
            if (text && mainBody) {
                let words = text.toLowerCase().match(/\b[a-z\d]+\b/g);
                if (words) {
                    let cellCount = words.length;
                    wordCount += cellCount;
                }
            } else if (text) {
                let words = text.toLowerCase().match(/\b[a-z\d]+\b/g);
                if (words) {
                    let cellCount = words.length;
                    extraCount += cellCount;
                }
            }
        }
    });
    return [wordCount, extraCount];
}
let wc = wordcount();
element.append(`Main word count: ${wc[0]} (References and appendices word count: ${wc[1]})`);